ABSTRACT

The possibility that certain features of items on a
reading comprehension test may lead to biased estimates of the
reading achievement of particular subgroups of students was
investigated. Item response data on the reading comprehension section
of a frequently used achievement test were obtained from the Anchor
Test Study data files. Eight nonoverlapping subgroups of students
were defined by the combinations of three factors: student grade
level (fifth or sixth), income level of the neighborhood in which the
school was located (low or middle/above), and race of the student
(black or white). Estimates of student ability and item parameters
were obtained separately for each of the eight subgroups using the
three-parameter logistic model. Bias indices were computed based on
differences in item characteristic curves for pairs of subgroups. A
criterion for labeling an item as biased was developed using the
distribution of bias indices for subgroups of the same race that
differed only in income level or grade level. Using this criterion,
three items were consistently identified as biased in four
independent comparisons of subgroups of black and white students.
Comparisons of content and format characteristics of items that were
identified as biased with those that were not, or between items
biased in different directions, did not lead to the identification of
any systematic content differences. (Author/MKM)









CENTER FOR THE STUDY OF READING



Technical Report No. 163 

AN INVESTIGATION OF ITEM BIAS
IN A TEST OF READING COMPREHENSION 

Robert L. Linn, Michael V. Levine,
C. Nicholas Hastings, and James L. Wardrop 

University of Illinois at Urbana-Champaign 

March 1980 



University of Illinois 

at Urbana-Champaign 
51 Gerty Drive 
Champaign, Illinois 61820 



Bolt Beranek and Newman Inc. 

50 Moulton Street 

Cambridge, Massachusetts 02138



The research reported herein was supported in part by the National
Institute of Education under Contract No. US-NIE-C-400-76-0116. The
authors thank William Tirre for his help with the data preparation
and analysis.




Abstract 

The possibility that certain features of items on a reading comprehension
test may lead to biased estimates of the reading achievement of particular
subgroups of students was investigated. Item response data on the reading
comprehension section of a frequently used achievement test were obtained
from the Anchor Test Study data files. Eight nonoverlapping subgroups of
students were defined by the combinations of three factors: student grade
level (fifth or sixth), income level of the neighborhood in which the school
was located (low or middle/above), and race of the student (black or white).
Estimates of student ability and item parameters were obtained separately for
each of the eight subgroups using the three-parameter logistic model. The
ability scales were then equated across pairs of subgroups and, in any
comparison of a pair of subgroups, an item was considered to be biased to the
degree that the probability of getting an item right differed from one
subgroup to the other when ability was held constant (i.e., the degree to
which the item characteristic curves (ICCs) differed). Bias indices were
computed based on differences in ICCs for pairs of subgroups. A criterion
for labeling an item as biased was developed using the distribution of bias
indices for subgroups of the same race that differed only in income level or
grade level. Using this criterion, three items were consistently identified
as biased in four independent comparisons of subgroups of black and white
students. Comparisons of content and format characteristics of items that
were identified as biased with those that were not, or between items biased
in different directions, did not lead to the identification of any systematic
content differences. The study did provide strong support for the viability
of the estimation procedure. Some suggestions for improvements in
methodology are offered.
An Investigation of Item Bias in a Test of

Reading Comprehension

Controversy over mental testing has a history that dates back almost to
the introduction of large-scale testing in World War I (Cronbach, 1975).
The possibility that tests underestimate the competence of identifiable
groups, particularly the poor and members of certain racial and ethnic
minorities, has been a recurrent issue in the ebb and flow of controversy.
The charge that standardized tests are biased against certain subgroups is a
familiar one. The statement that a test is biased has many different
meanings, however.

Bias is sometimes claimed as the natural consequence of the fact that
tests are culture-dependent. Certainly, performance on a test in English is
an unreasonable basis for making claims about the "verbal ability" of a
child who speaks and reads only Spanish. Such a claim would not only be
"biased"; it would be patently absurd. However, the test may provide a
reasonable indication of the child's current competence in English. Thus,
it is much more meaningful and potentially fruitful to speak of possible
bias in the interpretation and use of test results rather than bias in the
test per se.

A common use of tests is to predict some future behavior such as job
performance or success in school or college. For the predictive use of
tests, the issue of possible bias revolves around the question of whether or
not identifiable subgroups perform better on the job or in college than
would be predicted from their test scores (Anastasi, 1976; Cleary, 1968;
Linn, 1973; Petersen & Novick, 1976).

Prediction is one of the uses made of achievement tests, but it is by
no means the only use. More often achievement tests are used to assess
current status, to evaluate programs, and to diagnose problems. For the
non-predictive uses of achievement tests, strategies for assessing possible
sources of bias have generally focused on the internal characteristics of
the test. The goal is to identify non-essential characteristics of test
items that result in the misinterpretation of the achievement of certain
groups of students. For example, reading is a skill that is incidental to
the one that is purported to be measured by a mathematics achievement test.
Dependence of the test results on reading ability could lead to a biased
indication of the relative competence in mathematics for two groups that
differ in reading ability.

If items on a test differ in their dependence on the characteristic
that is incidental to the skill being assessed, then the biasing effects of
that incidental characteristic would be expected to result in an interaction
between the items and the characteristics of the examinees. In other words,
the magnitude of group differences in performance would be expected to vary
as a function of the extent to which items were dependent on the incidental
characteristics. Once identified, the offending items could be revised or
replaced in an effort to eliminate their biasing effects.

The idea of searching for item characteristics that interact with group
membership in order to reduce possible bias is not new. For example, the
stated purpose of the landmark study by Eells, Davis, Havighurst, Herrick,
and Tyler (1951) was to "identify (a) those kinds of test problems on which
children from high socioeconomic backgrounds show the greatest superiority
and (b) those kinds on which children from low socioeconomic backgrounds do
relatively well" (p. 6). Interactions between item content and sex were
investigated by Coffman (1961), and a number of studies have been conducted
to identify types of items that are unusually difficult for members of
minority groups (e.g., Angoff & Ford, 1973; Cleary & Hilton, 1968).

One of the limitations of the early studies of item-group interactions
is that they relied upon sample-dependent item statistics. There is no
sound theoretical basis for expecting a constant difference in the
proportion of people in two groups that respond correctly to various items.
A second limitation of definitions of item bias that depend on differences
in the proportion correct for two groups is that proportion correct is
confounded with other item characteristics such as item discriminating power
(Hunter, Note 1). The difference in proportion correct for two groups can
be expected to vary from item to item solely as a function of differences in
the discriminating power of the items. Thus, as stated by Warm (1978), "the
use of classical test theory item parameters is inappropriate for, and can
lead to erroneous identification of item bias" (p. 128).
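
The confounding described above can be illustrated with a small numerical
sketch. It assumes the three-parameter logistic form used later in this
report, normally distributed ability in each group, and two hypothetical
items that are identical for both groups and differ from each other only in
discriminating power. The function names, group means, and item values are
illustrative assumptions, not quantities taken from the study.

```python
import math

def icc(theta, a, b, c):
    """Three-parameter logistic ICC with the 1.7 scaling constant."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))

def proportion_correct(a, b, c, mean, sd=1.0, n_points=401):
    """Expected proportion correct when ability is normal(mean, sd).

    Uses a simple weighted sum over an evenly spaced theta grid.
    """
    lo, hi = mean - 6 * sd, mean + 6 * sd
    step = (hi - lo) / (n_points - 1)
    total = 0.0
    weight_sum = 0.0
    for k in range(n_points):
        theta = lo + k * step
        w = math.exp(-0.5 * ((theta - mean) / sd) ** 2)  # unnormalized density
        total += w * icc(theta, a, b, c)
        weight_sum += w
    return total / weight_sum

# Hypothetical groups: ability means of +0.5 and -0.5 on a common scale.
# Each item has the SAME ICC in both groups, so neither item is biased.
gaps = {}
for a in (0.5, 1.5):
    p_high = proportion_correct(a, b=0.0, c=0.2, mean=0.5)
    p_low = proportion_correct(a, b=0.0, c=0.2, mean=-0.5)
    gaps[a] = p_high - p_low
```

Under these assumptions the more discriminating item shows the larger group
difference in proportion correct even though neither item functions
differently for the two groups, which is the sense in which proportion
correct is confounded with discriminating power.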

Lord (1977a, 1977b), Scheuneman (in press), Wright (1977), and others
have suggested that latent trait theory provides a theoretically sounder
approach to the problem of identifying items that interact with group
membership than can be achieved using item statistics based on classical
test theory. Several recent studies (e.g., Warm, 1978; Ironson &
Subkoviak, 1979; Rudner, 1977) have compared indices of item bias based on
latent trait theory with indices from several earlier approaches. It is
clear that the earlier approaches, based on statistics used in classical
test theory, are not substitutes for an approach based on latent trait
theory.

The primary advantage of an approach based on latent trait theory is 
that, to the extent that the model holds, the item parameters should be 
invariant. That is, they should not depend upon the sample of people on
which the estimates are based. Thus, except for sampling error, the same 
estimates would be expected for different groups even though the groups may 
differ substantially in ability level. 

This study has two major purposes, one of which is methodological in
nature and the other substantive. Refinements are needed in the techniques
used to detect items that lead to biased estimates of the ability of a
particular group. The analyses conducted for this study were intended to
provide some evaluation of an approach based upon a particular latent trait
model and contribute to the development of better methods of using latent
trait models to detect items that result in biased ability estimates.

The substantive purpose of this study is to investigate the possibility
that certain features of items on a reading comprehension test may lead to
biased estimates of the reading achievement level for black students as
compared to white students and/or for children attending schools in low-
income neighborhoods as compared to those attending schools in middle- or
high-income neighborhoods. The identification of items that lead to such
estimates would be of particular value if the items so identified could be
characterized by some generalizable features that could be used as a guide
in constructing and editing reading comprehension tests to minimize bias
against particular subgroups of students.

Strategy for Identifying Bias
Birnbaum's (1968) three-parameter logistic model was used to obtain
estimates of ability and of the item parameters in all of the analyses
reported below. The LOGIST computer program (Wood, Wingersky, & Lord,
Note 2) was used to estimate the item parameters and abilities of the
students.

According to the three-parameter logistic model, the conditional
probability $P_i(\theta)$ that a person randomly chosen from all those with
ability $\theta$ will answer item i correctly is

(1) $P_i(\theta) = c_i + \dfrac{1 - c_i}{1 + \exp[-1.7 a_i (\theta - b_i)]}$

where $a_i$, $b_i$, and $c_i$ are item parameters. Thus, each item is
characterized by three parameters: the "item discriminating power," $a_i$;
the location or "difficulty" of the item, $b_i$; and the lower asymptote or
probability that persons with extremely low ability will respond correctly
to the item, $c_i$. The graph of $P_i(\theta)$ as a function of $\theta$ is
called the item characteristic curve (ICC) for item i. According to the
model, the probability of getting the item right is completely determined by
$\theta$ and the three item parameters. More specifically, members of
different groups with equal ability (i.e., equal $\theta$) should have the
same probability of getting an item right. In other words, the conditional
probabilities, $P_i(\theta)$, and their graphs, should be invariant from one
group to another.
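
As a minimal sketch of the model just described, the function below
evaluates the three-parameter logistic ICC at a given ability. The function
name `icc_3pl` and the item values are hypothetical and chosen only for
illustration; LOGIST itself is not reproduced here.

```python
import math

def icc_3pl(theta, a, b, c):
    """Probability of a correct response under the three-parameter
    logistic model: c + (1 - c) / (1 + exp(-1.7 * a * (theta - b)))."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))

# Hypothetical item: discrimination 1.0, difficulty 0.0, lower asymptote 0.2.
p_at_b = icc_3pl(0.0, a=1.0, b=0.0, c=0.2)     # at theta = b: halfway between c and 1
p_low = icc_3pl(-10.0, a=1.0, b=0.0, c=0.2)    # very low ability: close to c
p_high = icc_3pl(10.0, a=1.0, b=0.0, c=0.2)    # very high ability: close to 1
```

At $\theta = b_i$ the probability is $(1 + c_i)/2$, so the difficulty
parameter marks the point midway between the lower asymptote and 1 rather
than the point where the probability is .5.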

We approached identifying items that function differently for members
of different subgroups by comparing ICCs that were estimated separately for
different subgroups. If the ICCs of some items differ from group to group
more than would be expected due to sampling error, then such items may be
considered biased: the probability of getting an item right is not equal
for persons of the same overall ability who come from different subgroups.

Such bias may be the consequence of multidimensionality. That is, the
probability of getting an item right depends on more than one latent trait
(i.e., more than one $\theta$) and the groups differ in their distributions
of the secondary latent traits (Hunter, Note 1). Multidimensionality may
still be considered a form of bias, however, in that it can lead to apparent
differences in the primary ability when, in fact, there are no such
differences.

Procedures 

Data for the analyses reported below were obtained from the Anchor Test
Study (Bianchini & Loret, 1974) equating study files. Item response data on
the Reading Comprehension section of Form F of the Metropolitan Achievement
Tests (Durost, Bixler, Wrightstone, Prescott, & Balow, 1970) were obtained
for students in grades 5 and 6. Data were available for a total of 15,485
fifth-grade and 14,843 sixth-grade students. At each grade level, slightly
over 16% of the students with available data were black and somewhat over
76% were white. All analyses reported below are based on these two groups
of students within each grade.

The sample of students was divided into eight subgroups. The subgroups
were defined by grade (fifth or sixth), by race (black or white), and by
income level of the neighborhood in which the sample school was located (low
or middle/high). The analyses were based on all black students for whom the
necessary item response data were available. Analyses for white students
were based on spaced samples containing roughly the same number of students
as were in the black student samples attending low-income schools. Listed
in Table 1 is the number of students within each subgroup upon which the
parameters of the item characteristic curves were estimated. As can be
seen, group size was roughly 2000 per subgroup for all but the subgroup of
black students attending middle- or high-income schools. The latter was
considerably smaller, containing approximately 22% as many students, on the
average, as the other three subgroups at each grade level.

Insert Table 1 about here 



Under the assumption that the three-parameter logistic model holds for
all subgroups, the estimated abilities should be on essentially the same
scale regardless of the group used to obtain the estimates. The assumption
implies that the scales for different subgroups can differ only by a linear
transformation from one subgroup to another. Thus, it is possible to equate
the scales by means of a linear transformation and then make meaningful
comparisons of the ICCs for different subgroups.

The procedure used to find the linear transformation to equate the
scales for pairs of subgroups is based upon the property of the model that
item difficulties and latent ability estimates of the examinees are
expressed on the same scale. (See Lord, 1977b, for discussion of related
latent trait theory methods.) In other words, whatever transformation is
appropriate for the b's is also appropriate for the $\theta$'s and vice versa.
Since the b's were estimated for the same items for all subgroups, the
distribution of the b's should be the same except for sampling error after
the scales are equated.

The specific steps followed to equate the scales of two groups were as
follows. First, one group was arbitrarily identified as the "base" group
and the other as the "comparison" group. The scale of the base group was
left unchanged (i.e., no transformation was made of the $\theta$'s or item
parameters for the base group). Two constants, A and B, were then found
such that the weighted mean and variance of the transformed b's of the
comparison group were equal to the weighted mean and variance of the b's of
the base group. More specifically, if $b_i^*$ is the item difficulty of
item i in the comparison group after equating and $b_i$ is the corresponding
value prior to equating, then

(2) $b_i^* = A + B b_i$

where A and B are the equating constants selected such that the weighted
mean and variance of $b_i^*$ in the comparison group are equal to the weighted
mean and variance of the original b's in the base group. The i-th weight was
the inverse of the larger of the estimated variances of the $b_i$ computed from
the comparison group and the $b_i$ computed from the base group. Thus items
for which the difficulty parameter was poorly estimated (i.e., had a large
estimated sampling variance) for either of the groups were given relatively
less weight in determining the equating constants than were items for which
the difficulty parameter was better estimated. Detailed formulas used in
estimating the variances and covariances of the errors of estimate for the
item parameters and for approximating the standard error of a point on an
estimated item characteristic curve are provided in Appendix A, and the
detailed formulas used to obtain the equating constants (i.e., A and B) are
provided in Appendix B.
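
The equating step can be sketched as follows. This is a minimal sketch
based only on the verbal description above (matching the weighted mean and
variance of the difficulties, with each weight equal to the inverse of the
larger of the two estimated sampling variances); the exact formulas are in
Appendix B and may differ in detail. The function name and all numerical
values are hypothetical.

```python
import math

def equating_constants(b_base, b_comp, var_base, var_comp):
    """Return (A, B) so that A + B * b_comp matches b_base in weighted
    mean and weighted variance. Assumed weighting: 1 / max(variance)."""
    weights = [1.0 / max(vb, vc) for vb, vc in zip(var_base, var_comp)]
    w_total = sum(weights)

    def w_mean(xs):
        return sum(w * x for w, x in zip(weights, xs)) / w_total

    def w_var(xs):
        m = w_mean(xs)
        return sum(w * (x - m) ** 2 for w, x in zip(weights, xs)) / w_total

    B = math.sqrt(w_var(b_base) / w_var(b_comp))
    A = w_mean(b_base) - B * w_mean(b_comp)
    return A, B

# Hypothetical difficulties: the comparison-group values are the base-group
# values expressed on a shifted and stretched scale (b_base = 0.3 + 1.2 * b_comp).
b_base = [-1.0, -0.5, 0.0, 0.5, 1.5]
b_comp = [(b - 0.3) / 1.2 for b in b_base]
var_base = [0.04, 0.02, 0.01, 0.02, 0.09]   # hypothetical sampling variances
var_comp = [0.05, 0.02, 0.02, 0.01, 0.12]
A, B = equating_constants(b_base, b_comp, var_base, var_comp)
```

With error-free difficulties that differ only by a linear change of scale,
the procedure recovers the change of scale exactly; with real estimates the
weights reduce the influence of poorly estimated difficulties.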

Once the A and B in Equation 2 were obtained, the comparison group
ability estimates and estimates of item discrimination were converted to the
base group scale. In particular, the transformed $\theta$ scale, say
$\theta^*$, for the comparison group is given by

(3) $\theta^* = A + B\theta$,

and the transformed $a_i$'s, say $a_i^*$, by

(4) $a_i^* = a_i / B$.

No transformation of the $c_i$ parameter estimates is required.
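
A short check of why these transformations leave the ICC unchanged is
sketched below. It assumes that the discrimination estimate is divided by
B, which is the rescaling that keeps the product $a_i(\theta - b_i)$ in
Equation 1 fixed when $\theta$ and $b_i$ are both mapped through A + B(.).
Function names and numerical values are hypothetical.

```python
import math

def icc_3pl(theta, a, b, c):
    """Three-parameter logistic ICC with the 1.7 scaling constant."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))

def to_base_scale(theta, a, b, A, B):
    """Map comparison-group ability and item parameters to the base scale.

    theta and b are shifted and stretched; a is divided by B (assumed);
    the lower asymptote c is not transformed.
    """
    return A + B * theta, a / B, A + B * b

A, B = 0.3, 1.2                               # hypothetical equating constants
theta, a, b, c = 0.7, 0.9, -0.4, 0.15         # hypothetical examinee and item
theta_star, a_star, b_star = to_base_scale(theta, a, b, A, B)

p_before = icc_3pl(theta, a, b, c)
p_after = icc_3pl(theta_star, a_star, b_star, c)
```

Because $(a_i/B)\,[(A + B\theta) - (A + Bb_i)] = a_i(\theta - b_i)$, the two
probabilities agree, so the equating changes only the labeling of the
ability scale and not the curve itself.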
After the estimated abilities and item parameters of a comparison group
were transformed to the base group scale, several types of comparisons were
made. Item characteristic curves for each group were plotted on the common
scale and compared. In order to better evaluate whether observed
differences in ICCs were attributable simply to sampling error, the standard
errors of the ICCs were estimated and ICCs plus and minus two standard
errors of estimate were obtained and plotted for each group.

Indices of Bias 

In addition to the comparison of the ICCs and "confidence bands"
determined by their standard errors, several indices of item bias were
computed. Three of these indices were described by Ironson (Note 3) (see
also Ironson & Subkoviak, 1979). They involve areas between the ICCs of a
base group and a comparison group. Sums of squared differences between ICCs
were also computed. In all, eight bias indices (four weighted and four
unweighted) were computed (see Appendix C for details). (Some discussion of
the desirability of weighting indices according to the stability of the
estimates of the ICCs at various levels of $\theta$ will be provided at a
later point.) Thus, only the simpler unweighted indices are described here.

The four unweighted bias indices used for the results reported below
are as follows.

1. Base High Area: the size of the region between $\theta = -3$ and
$\theta = +3$ in which the base group ICC is above the comparison group ICC.

2. Base Low Area: the size of the region between $\theta = -3$ and
$\theta = +3$ in which the base group ICC is below the comparison group ICC.

3. Absolute Difference: the sum of 1 and 2.

4. Square Root of the Sum of Squares: the square root of the sum of
the squared differences between ICCs in the region of $\theta = -3$ to
$\theta = +3$.
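
The four unweighted indices can be approximated numerically as sketched
below. The grid, the midpoint rule, and the scaling of the sum of squares
by the grid step are assumptions made for this sketch; the computational
details actually used are those of Appendix C, so values produced this way
need not be on the same scale as the indices reported later. Function names
and item parameters are hypothetical.

```python
import math

def icc_3pl(theta, a, b, c):
    """Three-parameter logistic ICC with the 1.7 scaling constant."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))

def unweighted_bias_indices(base, comp, lo=-3.0, hi=3.0, n_steps=600):
    """base and comp are (a, b, c) tuples already on a common theta scale."""
    step = (hi - lo) / n_steps
    base_high = base_low = sum_sq = 0.0
    for k in range(n_steps):
        theta = lo + (k + 0.5) * step                 # midpoint of each interval
        diff = icc_3pl(theta, *base) - icc_3pl(theta, *comp)
        if diff > 0:
            base_high += diff * step                  # base ICC above comparison ICC
        else:
            base_low += -diff * step                  # base ICC below comparison ICC
        sum_sq += diff * diff * step                  # assumed step scaling
    return {
        "base_high_area": base_high,
        "base_low_area": base_low,
        "absolute_difference": base_high + base_low,
        "root_sum_of_squares": math.sqrt(sum_sq),
    }

# Hypothetical item that is identical in both groups, and one that is
# harder (larger b) for the comparison group.
identical = unweighted_bias_indices((1.0, 0.0, 0.2), (1.0, 0.0, 0.2))
shifted = unweighted_bias_indices((1.0, 0.0, 0.2), (1.0, 0.5, 0.2))
```

In the second case the comparison group ICC lies below the base group ICC
everywhere, so the base high area is positive and the base low area is zero.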

An item with a large base high area (Index 1) but small or zero base
low area (Index 2) would be considered to be biased against the comparison
group. Such an outcome would indicate that persons in the comparison group
have a smaller probability of getting the item right than persons in the
base group with equal estimated ability. The direction of the bias would be
just the opposite for an item with a large base low area but zero or small
base high area. The bias in an item with large base high and large base low
areas would depend upon the distribution of ability in the groups of
examinees contrasted.

Estimates of item parameters that were obtained separately for the
eight subgroups defined by grade level, race, and income level of the school
were used to make a total of twelve pairwise comparisons. In each pairwise
comparison, the base group and the comparison group differed in only one of
the three characteristics used to define the subgroups. Thus, there were
four independent comparisons of the different levels of each group
characteristic with constant levels of the other two group characteristics.
For example, for a fixed grade and income level of school, comparisons were
made across racial groups, so that four comparisons were made of black and
white students (fifth- or sixth-graders from lower-income or
middle/higher-income schools). Similarly, income level comparisons were made
for each of four race-by-grade combinations, and grade comparisons were made
for each of four race-by-school income combinations. The base group and
comparison group in each of the twelve comparisons are listed in Table 2.
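
The count of twelve comparisons follows directly from the design, as the
enumeration below illustrates. The subgroup labels are shorthand adopted
for this sketch (L or M for income, W or B for race, 5 or 6 for grade), and
the sketch does not say which member of each pair served as the base group;
that assignment is given in Table 2.

```python
from itertools import combinations, product

incomes = ("L", "M")      # low, middle/high income neighborhood
races = ("W", "B")        # white, black
grades = (5, 6)

subgroups = list(product(incomes, races, grades))    # the eight subgroups

def n_differences(g1, g2):
    """Number of defining characteristics on which two subgroups differ."""
    return sum(x != y for x, y in zip(g1, g2))

# Keep only pairs that differ in exactly one characteristic.
pairs = [(g1, g2) for g1, g2 in combinations(subgroups, 2)
         if n_differences(g1, g2) == 1]

by_factor = {"income": 0, "race": 0, "grade": 0}
for g1, g2 in pairs:
    for name, x, y in zip(("income", "race", "grade"), g1, g2):
        if x != y:
            by_factor[name] += 1
```

Of the 28 possible pairs of subgroups, only these 12 differ in a single
characteristic, four for each of income, race, and grade.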



Insert Table 2 about here 



Results 


The item parameter estimates prior to equating of the $\theta$ scales are
listed in Appendix D for each group. These estimates along with the
equating constants reported in Appendix E may be used to compute the ICCs on
the scale used in any of the comparisons for any item.

A general indication of the comparability of the ICCs for the 12
pairwise comparisons is provided by the distributions of the square root of
the sum of squares bias indices. When an item has very similar ICCs for two
groups, the index should be near zero. Distributions of the bias index
values for the 45 items are shown for all twelve pairwise comparisons in
Figure 1. The top four distributions provide comparisons of grade 5 with
grade 6 holding race and income level constant. The middle four
distributions provide income level comparisons holding grade and race
constant, and the bottom four distributions show the results of the racial
group comparisons with grade and income held constant. The group
characteristics that are held constant for a given distribution are
identified by the letters and numbers above each histogram. For example,
the left-hand histogram in the first row of Figure 1 is the grade comparison
for white students attending schools in low-income neighborhoods and is
denoted LW. Another example is the M6 over the right-hand histogram in the
bottom row of Figure 1. M6 denotes that the racial comparison in the lower
right-hand histogram is for sixth-grade students attending schools in
middle- or high-income neighborhoods.



Insert Figure 1 about here 



An immediate observation that can be made from an inspection of
Figure 1 is that there are fewer large values of the bias index for the four
comparisons involving only white students than for any of the other
comparisons. These four are the comparisons of ICCs across grade for white
students (the two left-hand distributions in the top row of Figure 1) and
across income level for white students (the two left-hand distributions in
the middle row of Figure 1). Only one of the 180 bias indices is as large
as .2 for these four distributions. None is as large as .3.

Items with indices less than .2 have quite similar ICCs. Some
indication of the degree of similarity is provided by the plots shown in
Figure 2 for two items. The plots in Figure 2 compare the ICCs for fifth-
grade white students attending schools in low-income neighborhoods (LW5)
with their sixth-grade counterparts (LW6). Item 6 has the second largest
index (square root of the sum of squares bias index equals .161) of any of
the 45 items. The index for Item 18 is .070, which is closer to the mean of
.076 for the 45 items.
Insert Figure 2 about here



The three solid lines show the ICC, the ICC plus two standard errors of
estimate, and the ICC minus two standard errors of estimate for LW5
students, and the three dashed lines show the corresponding curves for the
LW6 students. The ICCs in Figure 2 are strikingly similar. This provides
rather strong support for the claim of invariance. Even the item with the
largest sum of squares bias index has ICCs with confidence intervals which
overlap substantially throughout most of the range of ability. This
evidence of invariance of the parameters over grade level and income level
for white students strengthens the case for using ICC comparisons to
identify items that result in biased estimates for particular subgroups.
The distributions of indices for the four pairwise comparisons of white
subsamples also provide a base rate against which the indices for other
pairwise comparisons can be evaluated.

Returning to Figure 1, it can be seen that the black subsamples provide
less evidence of invariance across either grade level or income level.
Comparisons involving middle-income black subsamples might be expected to
show less invariance because the estimates are all less stable due to the
smaller sample sizes. The comparison of black fifth-graders attending
schools in low-income neighborhoods (LB5) with black sixth-graders attending
schools in low-income neighborhoods (LB6), however, involves sample sizes
comparable to the white subgroup comparisons. Yet four of the items have
indices of .1 or larger for the LB5 vs. LB6 comparison. A plot of the ICCs
and the ICCs plus and minus two standard errors for the item with the
largest index in the LB5-LB6 comparison is shown in Figure 3. Item 35 has
an index of .256 for the LB5-LB6 comparison. As can be seen in Figure 3,
the ICCs show greater divergence for these two groups than was observed for
the LW5-LW6 comparisons illustrated in Figure 2. The separation of the
ICCs, however, occurs mainly for $\theta$ values of 2 or above where there are
relatively few examinees in either group. The fact that the ICC, especially
for the base group, is poorly estimated for $\theta$ values greater than 2.0 is
indicated by the divergence of the upper and lower bounds for the ICCs.
Considering that Item 35 is the most discrepant of the 45 items in the
LB5-LB6 comparison and that the difference occurs only at values of $\theta$
greater than 2, one might still argue that the ICCs are generally quite
similar for the comparisons of black students at different grade levels.



Insert Figure 3 about here



The comparisons of primary interest in Figure 1 are, of course, those
between white and black subgroups of students since it is there that the
presence of biased items is most suspected. The last row of Figure 1 shows
the distributions of the square root of the sum of squares bias index for
the four pairwise comparisons between subgroups of white students and
subgroups of black students. Large indices are clearly observed with
greater frequency in the four comparisons in the last row of Figure 1 than
in the across grade or income level comparisons for white students. Only
occasionally are the indices for the racial group comparisons more extreme
than they are for the within-race comparisons for black students.

Using a cutoff of .2 to indicate a possibly biased item, one would so
identify 13 of the 45 items in the LW5-LB5 comparison and 7 items in each of
the other three comparisons between racial groups. The number of items
identified as possibly biased obviously depends on the stringency of the
criterion employed. But the ICCs corresponding to the largest indices are
markedly different.

The agreement among the four independent between-race comparisons
regarding the identification of items as possibly biased is far from
perfect. On the other hand, the agreement is considerably better than would
be expected if items were randomly identified by the four independent
comparisons. Using the above criterion, three of the items were identified
in all four pairwise comparisons. If an equal number of items had been
selected at random in each comparison the probability that an item would be
selected all four times is only .00109. Thus, the expected number of items
that would be identified four times by a random process is only about .05
(i.e., 45 x .00109). The expected distribution of number of times an item
would be selected by a random process in the four independent comparisons is
shown in Table 3. Also provided in Table 3 is the corresponding observed
distribution. The top three categories (i.e., where an item was identified
as biased 2, 3, or 4 times) were collapsed so that the expected frequency
was greater than 5 for each category (0, 1, and 2 or more) and a Chi-square
statistic with 2 degrees of freedom was computed to test the goodness-of-
fit. The resulting Chi-square was 12.13, which is significant at the .01
level. The agreement is clearly better than would be expected on the basis
of chance.
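
The chance figures quoted above can be reproduced with the short sketch
below. It assumes that random selection means each comparison independently
flags the same number of items that it actually flagged at the .2 cutoff
(13, 7, 7, and 7 of the 45 items). The observed distribution from Table 3
is not reproduced here, so the Chi-square statistic itself is not
recomputed. Variable names are illustrative.

```python
n_items = 45
flagged = [13, 7, 7, 7]      # items flagged in each between-race comparison
probs = [k / n_items for k in flagged]

# dist[j] = probability that a given item is flagged in exactly j of the
# four independent comparisons (built up one comparison at a time).
dist = [1.0]
for p in probs:
    new = [0.0] * (len(dist) + 1)
    for j, q in enumerate(dist):
        new[j] += q * (1 - p)       # not flagged in this comparison
        new[j + 1] += q * p         # flagged in this comparison
    dist = new

p_all_four = dist[4]                              # about .00109
expected_all_four = n_items * p_all_four          # about .05 items
expected_counts = [n_items * q for q in dist]     # expected items flagged 0..4 times
collapsed = [expected_counts[0], expected_counts[1], sum(expected_counts[2:])]
```

Under this assumption the product (13/45)(7/45)(7/45)(7/45) rounds to
.00109, and collapsing the categories for 2, 3, and 4 selections yields
expected frequencies above 5 in each of the three remaining categories.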



Insert Table 3 about here 



Table 4 provides additional information about the agreement among the
four comparisons in the identification of possibly biased items using a .2
cutoff for the square root of the sum of squares bias indices. The
agreement between each pair of independent comparisons is shown in Table 4.
Also listed in Table 4 are the phi coefficients and Chi-square statistics
corresponding to the two-by-two contingency tables. With the exception of
the low-income grade 5 (L5) vs. the middle-income grade 5 (M5) comparison,
the phi coefficients are all significantly different from zero at the .05
level.
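
For readers unfamiliar with the phi coefficient, the sketch below shows how
it and the associated Chi-square statistic (1 degree of freedom, no
continuity correction) are obtained from a two-by-two table of flagged
versus not-flagged items. The cell counts are hypothetical and are not the
values in Table 4; the function name is also an assumption of this sketch.

```python
import math

def phi_and_chi_square(both, first_only, second_only, neither):
    """Phi coefficient and uncorrected Chi-square for a 2 x 2 table.

    both:        items flagged in both comparisons
    first_only:  items flagged only in the first comparison
    second_only: items flagged only in the second comparison
    neither:     items flagged in neither comparison
    """
    n = both + first_only + second_only + neither
    row1, row2 = both + first_only, second_only + neither
    col1, col2 = both + second_only, first_only + neither
    phi = (both * neither - first_only * second_only) / math.sqrt(
        row1 * row2 * col1 * col2)
    chi_square = n * phi ** 2
    return phi, chi_square

# Hypothetical table for 45 items, with 7 items flagged in each comparison.
phi, chi_square = phi_and_chi_square(both=3, first_only=4,
                                     second_only=4, neither=34)
```

For a two-by-two table the Chi-square statistic equals the number of items
times the squared phi coefficient, so testing phi against zero and testing
independence of the two classifications are the same test.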



Insert Table 4 about here 



One final indication of the consistency of the bias indices across
independent comparisons of white and black students is provided by the
product moment correlations between the square root of the sum of squares
bias indices. These correlations are reported in Table 5. With 45 items, a
correlation of .3 or greater is significantly different from zero. The
correlations involving comparisons that differ only in income level of the
groups (e.g., L5 with M5) or only in grade level (e.g., L5 with L6) are all
significantly different from zero. Correlations based on comparisons that
differ both in income and grade level (e.g., L5 with M6), while positive,
are not significant.



Insert Table 5 about here 



An attempt was made to improve the bias indices by weighting the
differences between the ICCs by the reciprocal of the estimated standard
error of the difference between ICCs at each θ value (see Appendix C for
computational details). It was reasoned that a weighted index would lead to
the appropriate discounting of differences between ICCs in regions of θ
where one or both of the ICCs were poorly estimated. However, results for
the square root of the weighted sum of squares bias indices were quite
similar to those for the unweighted indices. Using either index, three items
were identified as biased in all four of the independent racial group
comparisons. Furthermore, the same three items were so identified with
either index. For this reason we have chosen to report results only for the
simpler unweighted indices to conserve space. As will be discussed below,
however, there are reasons to think that the idea of weighting is a good one
and that improved bias indices may be developed using more refined
estimating and weighting procedures.

The ICCs for the three items that were identified as possibly biased in
all four comparisons using the square root of the sum of squares bias index
are shown in Figures 4, 5, and 6. Each of these figures presents four pairs
of ICCs plus and minus two standard errors of estimate for a single item.
In each figure the solid lines are the ICCs plus and minus two standard
errors for the white sample and the dashed lines are the comparable figures
for the black sample at the same grade level and income level of
neighborhood.



Insert Figures 4, 5, and 6 about here 



From an inspection of Figure 4, it is apparent that the four
independent comparisons show a great deal of consistency. In each
comparison the ICC for the white students is above that of the black
students for low and mid-range values of θ. Item 3 is less discriminating
(smaller value for a) for white than for black students in each of the
comparisons, and therefore the ICCs cross and the one for black students is
above the one for white students at high values of θ. Although the
direction of bias depends on the value of θ, Item 3 is generally biased
against black students in the region where the majority of the black student
sample falls (i.e., below a value of θ equal to the mean of the white
student sample). If more items with ICCs similar to those of Item 3 were
added to the test, the test performance of most black students would appear
worse than it currently does in comparison to white students. On the other
hand, elimination of Item 3 would tend to improve the relative standing of
black students.

Item 25, which is depicted in Figure 5, has large bias indices in all
four comparisons. The large bias indices are brought about largely by the
very poor discriminating power of Item 25 for black students. Item 25 is a
difficult item for all subgroups. It discriminates well among high-ability
sixth-grade white students. The discrimination of Item 25 for high-ability
black students, however, is problematic. The estimates are poorly
determined due to the small number of black students with θ's in the region
where Item 25 seems to be most discriminating. This poor estimation of the
discriminating power of Item 25 is illustrated by the wide confidence bands
for the ICCs for black students in three of the four cases. In the fourth
case (Figure 5d), the ICC for black students is essentially flat throughout
the -3.0 to +3.0 range of θ. The estimated value of the item discriminating
power is so small (a = .01) that the ICC at θ = -3.0 is essentially equal to
the ICC at θ = +3.0.
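
The flatness of an ICC with a = .01 can be checked directly from the three-parameter logistic curve defined in Appendix A. In the sketch below only a = .01 comes from the text; the values of b and c are hypothetical, and the curve is written without any additional scaling constant, following the form given in Appendix A.

```python
import math

def icc(theta, a, b, c):
    """Three-parameter logistic ICC: c + (1 - c) / (1 + exp(-a (theta - b)))."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# a = .01 is the estimate quoted in the text; b and c are ASSUMED values.
low = icc(-3.0, a=0.01, b=1.5, c=0.2)
high = icc(3.0, a=0.01, b=1.5, c=0.2)
rise = high - low  # change in P(correct) across the whole -3 to +3 range
print(f"ICC rises by {rise:.4f} between theta = -3 and theta = +3")
```

Because the slope of the logistic term never exceeds a/4, the rise over a six-unit range is bounded by 6(1 - c)a/4, about .012 here, whatever value of b is assumed.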

The results in Figure 6 for Item 31 illustrate a situation that is
different from that for either Item 3 or 25. The pairs of ICCs are quite
similar for low values of θ but for higher values of θ the curve for black
students is above the one for whites in all four of the comparisons. Thus,
if anything, Item 31 would be considered biased in favor of black students
relative to other items on the test. Inclusion of more items like Item 31
would tend to improve the relative standing of black students on the test.

For comparative purposes, the ICCs for Items 3, 25, and 31 are shown in
Figure 7 for the between-grade comparison for white students attending
schools in low-income neighborhoods. As can be seen, Items 3 and 31 have
confidence intervals that overlap substantially for the two groups
throughout the -3.0 to +3.0 range of θ. For Item 25, the confidence
intervals generally overlap, but show some divergence around θ equal to 0.
As might be expected from an inspection of Figure 7, Item 25 has one of the
larger sum of squares bias indices. Indeed, the square root of the sum of
squares bias index for Item 25 is .181, which is the largest value for the
45 items in the LW5-LW6 comparison. Although Item 25 has a somewhat flatter
ICC for the LW5 sample than for the LW6 sample, the difference is not nearly
as great as the differences for the white and black samples shown in
Figure 5.



Insert Figure 7 about here 

The contrasts that are found between groups for the items in Figures 4,
5, and 6 may be summarized by the four bias indices computed for each of the
contrasts. In order to facilitate comparisons, the indices for the 45 items
were first rank-ordered, with a rank of 1 given to the item with the highest
value of a particular index for a given contrast. The rank ordering was
obtained separately for each index and each contrast. The rank orders of the
bias indices for the three items in Figures 4, 5, and 6 are listed in
Table 6.

Insert Table 6 about here



Item 25 has relatively large base high bias indices in all four of the
independent racial group comparisons. Indeed, in three of the four
comparisons, Item 25 has the largest or second largest base high bias index.
The white sample was used as the base group and the black sample as the
comparison group in all four racial group comparisons. Thus, a large value
of a base high bias index implies that the ICC for white students tends to
be above the ICC for black students. The large base high bias indices for
Item 25 accurately reflect the fact (seen in Figure 5) that the ICC for
white students is generally above the one for blacks. The relatively
smaller, but nonzero, base low bias indices for Item 25 reflect the fact
that the ICCs cross in all four comparisons. Item 31, on the other hand,
has either the largest or second largest base low bias indices but
relatively small base high bias indices in each of the comparisons.

Item 3 has base high and base low bias indices that generally rank
among the highest third of the items. Thus, the relatively large overall
indices reflect a combination of moderately large base high and base low
differences due to the crossover of the ICCs in all four comparisons.

Items 25 and 31 are probably the two most clearly contrasting items in
terms of the racial group differences in ICCs. Item 25 was consistently
identified as biased against black students while Item 31 was consistently
identified as biased in favor of black students. The items are of quite
different types. Item 25 asks the meaning of the word "character" as it is
used in one of the reading passages on the test. Item 31, on the other
hand, asks for the "best title" of a story about a fictional baron presented
in another passage.

There are eleven items that ask the meaning of a word as used in a
passage and five items that ask the "best title" of a story. The rank order
of the base high bias index and the base low bias index is listed in Table 7
for the word meaning and "best title" items for each of the four racial
group comparisons (see Appendix F for a complete listing of bias indices for
all comparisons). The simple comparison of these two types of items does
not reveal a clear tendency for one type to be biased against black students
and the other biased in their favor. With the exception of Item 31, the
"best title" items have few high ranks on either of the indices. In
addition to Item 25, "character," Items 2, "there," 27, "reigning," and
42, "speculate," along with the "setting" item, tend to have fairly high
ranks on the base high bias index. Some of the other word meaning items,
however, have relatively low ranking base high bias indices and may even
rank higher on the base low bias index (e.g., Item 15, "Test"). Thus,
generalizations based on such surface-level characteristics of the items do
not seem warranted.

Insert Table 7 about here

Cumulative Effect

Although the analyses have not led to clear generalizations regarding
the content or structural characteristics of items that result in bias,
there is strong evidence that the ICCs of at least a few items are not
constant for groups of white and black students. For some items the ICC for
white students tends to fall above the ICC for black students and for others
the reverse is true. The overall impact of the difference in ICCs on the
total test score depends upon the particular mix of items on the test and
the degree to which positively biased items are balanced by negatively
biased items. The overall effect on total score was evaluated in two
closely related ways. First, test characteristic curves (Lord & Novick,
1968, p. 386) were computed on the equated θ scale separately for white and
black students in each of the four racial group comparisons. Secondly,
expected observed score frequency distributions were computed separately for
white and black groups at selected points on the equated θ scale.

The pairs of test characteristic curves (TCCs) for the four racial
group comparisons are shown in Figure 8. A TCC for one group that is above
that for another at a particular value of θ implies that the cumulative
impact of differences in ICCs yields an overall bias against the members of
the group with the lower TCC who are at that θ value. Although the curves
show a great deal of similarity, there is a tendency for the TCC for white
students to be as high or higher than the one for black students, suggesting
a slight cumulative bias against black students.

Insert Figure 8 about here

The difference in TCCs varies as a function of θ. For example, in the
LW5-LB5 comparison (section a of Figure 8), the curves are almost identical
for low values of θ. At these low values of θ the test does not
discriminate very well for either group, but there is no systematic bias.
For higher values of θ the TCC for white students is higher than the one for
black students. The difference in the TCCs for LW5-LB5 is .021 at θ = 0,
.027 at θ = 1, and .027 at θ = 2. Translated into number of items right on
the 45-item test, these differences would imply raw score differences of
between .95 and 1.22 points. Similar comparisons of the pair of TCCs in the
other three sections of Figure 8 suggest that up to about one raw score
point difference between the scores of white and black students may be
attributable to the cumulative impact of group differences in ICCs.
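
The conversion from TCC differences to raw score points can be sketched as below. The TCC differences are the values quoted in the text; treating the TCC as the mean of the item ICCs (an expected proportion correct) is an assumption that is consistent with those values, and the function names are illustrative.

```python
import math

N_ITEMS = 45

def icc(theta, a, b, c):
    """Three-parameter logistic ICC in the form given in Appendix A."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def tcc(theta, items):
    """ASSUMED form of the test characteristic curve: the mean of the item
    ICCs, i.e., the expected proportion of items answered correctly."""
    return sum(icc(theta, a, b, c) for a, b, c in items) / len(items)

# TCC differences (white minus black) reported for LW5-LB5, keyed by theta.
reported_differences = {0: 0.021, 1: 0.027, 2: 0.027}

# Multiplying by the number of items converts a proportion to raw score points.
raw_score_gaps = {theta: N_ITEMS * d for theta, d in reported_differences.items()}
for theta, gap in raw_score_gaps.items():
    print(f"theta = {theta}: about {gap:.3f} raw score points")
```

The products are .945 and 1.215, which match the reported range of .95 to 1.22 points.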

Although one raw score point is only about one-eighth of the group
difference in mean scores on the Metropolitan Achievement Tests, even this
amount is nontrivial. At some points on the scale, one raw score point
would translate into about a tenth of a grade equivalent unit.

The second analysis that was conducted to evaluate the cumulative
impact of differences in ICCs was the computation of expected raw score
frequency distributions for black students and white students at selected
points on the equated θ scale. As would be expected, the results of this
analysis are quite consistent with the test characteristic curve results.
They merely provide an alternative way of considering the cumulative effect.
Therefore only one pair of expected raw score relative frequency
distributions is presented here. Figure 9 shows the expected distributions
for fifth-grade white and black students attending schools in low-income
neighborhoods. The distributions were computed for θ = 0 using the item
characteristic curves separately for each group to estimate the probability
that persons from that group with θ = 0 would get each item right. As can
be seen, the distributions are very similar except that the LW5 distribution
is shifted up approximately one raw score point relative to the LB5
distribution.
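
One standard way to obtain such an expected raw score distribution, given the probability of a correct answer on each item at a fixed θ, is to build it up one item at a time. The sketch below assumes independent item responses at fixed θ; the report does not state which algorithm was used, and the item probabilities are hypothetical.

```python
def raw_score_distribution(p_correct):
    """Distribution of number-right scores for independent items with the
    given probabilities of a correct response (one entry per item)."""
    dist = [1.0]  # with no items, a score of 0 is certain
    for p in p_correct:
        updated = [0.0] * (len(dist) + 1)
        for score, prob in enumerate(dist):
            updated[score] += prob * (1.0 - p)  # item answered incorrectly
            updated[score + 1] += prob * p      # item answered correctly
        dist = updated
    return dist

# ASSUMPTION: hypothetical P(correct) at theta = 0 for a short five-item test.
p_at_theta_zero = [0.35, 0.50, 0.62, 0.71, 0.80]
dist = raw_score_distribution(p_at_theta_zero)
expected_score = sum(score * prob for score, prob in enumerate(dist))
```

The mean of the resulting distribution equals the sum of the item probabilities, which is the link between this analysis and the test characteristic curve.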



Insert Figure 9 about here 



An alternative explanation of the results in Figures 8 and 9 is that
there is a systematic error in equating the ability scales. That is, if the
equating constants, A and B in Equation 2, were changed, the TCCs in Figure 8
and the distributions of expected raw scores in Figure 9 could be made to
coincide more precisely. The two possible explanations cannot be
distinguished. Indeed, the method is not really designed to detect bias
that is found consistently in all items. Rather, it can only be expected to
identify items that are biased in one direction or the other relative to
other items on the test. Thus, an equating procedure that made the TCCs as
comparable as possible is probably to be preferred. This alternative
approach is currently being investigated.

The analyses involving comparisons of students at different grade
levels or who attend schools located in neighborhoods with different income
levels showed that the ICCs were generally very similar. For example, the
ICCs based on a sample of fifth-grade white students attending schools in
low-income neighborhoods were almost indistinguishable from those for a
sample of their sixth-grade counterparts. The results showing a high
degree of similarity between ICCs for the within-race comparisons involving
differences in the other two grouping variables lend credence to the
viability of the general approach. A basic assumption of the latent trait
model is that the item parameters, and therefore the ICCs, are invariant
over different groups of people. Thus, the remarkably good invariance of
the ICCs over grade level and income level within racial groups suggests
that the model is reasonable for the 45 items on the test that was analyzed.

The degree of invariance in the ICCs was noticeably less for the racial
group comparisons than for either the grade level or income comparisons.
This suggests that there are some items that function differently for black
students than they do for white students. Such items may reasonably be
labeled as biased. Whatever the cause of the difference in the ICCs, the
effect of including a larger or smaller number of items where the ICC of one
group is above that of another is the same. The relative standing of black
students would be higher on a test that had fewer items where the ICC for
white students was above the one for black students.

Although a few items were consistently identified as biased in each of
the four independent comparisons, the consistency of identification at
different grade levels and/or different income levels was far from perfect.
For example, using the criterion that the square root of the sum of squares
bias index was greater than .2, seven items were identified as possibly
biased in the comparison of low-income white students in grade 6 with
low-income black students in grade 6. Of these seven items, 7, 3, and 4 were
also identified as possibly biased in the other three racial group
comparisons (i.e., LW5-LB5, MW5-MB5, and MW6-MB6, respectively). Only three
items were identified as possibly biased in all four comparisons. The
modest amount of agreement among the independent comparisons suggests that,
at least for the test studied, it is apt to be difficult to identify items
that are clearly biased.

Although the ICCs were substantially different for white and black 
students for a few of the items in one or more of the comparisons, the 
overall impression is that the ICCs were generally quite similar. 
Furthermore, the direction of the bias for the few items that showed a 
consistently large difference was not always against black students. One of 
the three consistently identified items was, if anything, biased in favor of
black students. 

Comparisons of the content and format characteristics of items that
were identified as biased with those that were not, or between items biased
in different directions, did not lead to the identification of any
systematic differences. For example, items asking the meaning of a word in
context sometimes appeared to be biased in one direction and sometimes in
the other. Thus, no generalized principles that would be useful in avoiding
items that tend to show bias can be stated for guiding the future
construction of tests of reading comprehension. Instead, only a post hoc
analysis procedure that may be useful in eliminating biased items after the
items have been administered can be offered.

Analysis of the cumulative impact of the difference in the ICCs
suggests that these differences might be used to explain about one point of
the gap between raw score means for white and black students. This
difference may be an artifact of errors in equating, however. Thus, it
seems desirable to explore alternative equating procedures. We are
currently investigating a procedure that will solve for the constants used
for the linear equating of the ability scales such that the differences
between the test characteristic curves are minimized.

There are important advantages in the use of comparisons of ICCs such
as those in this study over approaches that simply compare estimated item
parameters. It is possible, as was sometimes observed in our analyses, for
item parameters to be substantially different, yet for there to be no
practical difference in the ICCs. This can occur, for example, where the b
parameter is estimated to be exceptionally high for one group. To
illustrate this, consider the following pairs of hypothetical item
parameters for two groups in terms of a common θ scale: group 1, a = 1.8,
b = 3.5, and c = .2; group 2, a = [illegible], b = 5.0, and c = .2. The item
difficulties and discriminating powers for the two groups are markedly
different. But the difference in the ICCs is never greater than .05 for θ
values between -3 and +3. Thus, the suggestion of bias based on a large
difference in estimated item difficulty or discriminating power might be
misleading. The value of practical concern is the difference in the
probability of getting the item right for people of equal ability from
different groups. This is, of course, precisely the difference in ICCs.

The use of estimates of the standard errors of the ICCs seems
potentially useful. By plotting bands of two standard errors on either side
of the ICCs, it became evident that some seemingly large differences in ICC
curves were occurring only in regions where one or both of the ICCs being
compared were poorly estimated. The advantages of using estimated standard
errors, however, were not very apparent in terms of a comparison of the
weighted and unweighted bias indices. It may be that better estimation
procedures are needed for this purpose.

One problem that may limit the utility of the standard errors as they
were estimated in this study is caused by the tendency for the LOGIST
estimated abilities of some subjects to diverge. To deal with this problem,
the ability estimates were arbitrarily limited to a range of plus and minus
4.0. For some of the groups sizeable numbers of students had ability
estimates at the lower extreme. For example, 44 of the MB5 sample students
had estimated θ of -4.0. This artificial clustering of subjects at the
extreme results in estimated standard errors of the ICC at low ability
levels that are too small. That is, the inflated number of examinees at the
extreme makes it appear as if there is more information at that ability
level than would be the case without the need to fix bounds on θ. In future
analyses we plan to deal with this problem by estimating standard errors
after deleting examinees with extreme θ values or by using estimated ability
distributions.

Despite the limitations noted above and the fact that the results did
not lend themselves to making generalizations about features of items that
result in biased estimates of achievement for members of a particular
subgroup, there are still some noteworthy results from the study. The
results provide strong support for the reasonableness of the three-parameter
model for data of this kind. The across grade level comparisons revealed
strikingly similar item characteristic curves. The procedures used for
placing confidence bands around the item characteristic curves yielded
reasonable results, which, with refinements such as those suggested above,
hold the promise of substantially improving the basis for comparing item
parameters and item characteristic curves.

Appendices

A. Procedure for approximating the standard error of a point on an estimated
   item characteristic curve.

B. Procedure for estimating equating constants.

C. Procedures for estimating item bias indices.

D. Item parameter estimates and standard errors for each subgroup prior to
   scale equating.

E. Estimates of equating constants.

F. Bias indices for each pairwise comparison.

Appendix A

Procedure for Approximating the Standard Error of a Point on an
Estimated Item Characteristic Curve

A.1 Motivation and Notation

A plausible measure of the extent to which an item has different
characteristics for different groups is

$$\int_a^b [P(x) - P^*(x)]^2 \, dx. \tag{1}$$

Here P(x) is the estimated ICC evaluated at ability x for one group and P*(x)
is the estimated ICC for the other group.

The comments that follow are equally applicable to measures of the form

$$\int_a^b |P(x) - P^*(x)| \, dx, \qquad
\sum_{i=1}^{n} (P(x_i) - P^*(x_i))^2, \qquad
\sum_{i=1}^{n} |P(x_i) - P^*(x_i)|.$$

A problem with (1) is that it will be strongly influenced by the least
reliable parts of the data. More specifically, if the statistics P(x) and
P*(x) have large sampling errors, then the difference between these
independent statistics will tend to be large too. Consequently, the
least-well-estimated values of P(x) and P*(x) are expected to make a
relatively large contribution to (1), and a confounding between item
unfairness and estimation error is likely.

One way to improve upon (1) is to consider introducing weights w(x)
that would control the contribution at ability level x to the measure and
give a formula of the form

$$\int_a^b (P(x) - P^*(x))^2 \, w(x) \, dx.$$

The goal is to give relatively large weight to those values of x such that
P(x) - P*(x) is well estimated and small weight to values of x such that the
difference is poorly estimated. In particular, (2) takes the weights to be
inversely related to σ(x), where σ(x) is an approximation of the standard
error of the difference P(x) - P*(x). To use (2), an approximation of σ(x)
is needed.

Since P(x) and P*(x) are estimated from different groups (and are therefore
independent statistics),

$$\sigma^2(x) = \mathrm{Var}[P(x) - P^*(x)] = \mathrm{Var}[P(x)] + \mathrm{Var}[P^*(x)].$$

Therefore, it will be sufficient to develop an approximation for Var[P(x)]
and Var[P*(x)] separately.

To do this, a more explicit notation is needed. Let P(a, b, c; x) be
the general three-parameter curve evaluated at x, i.e.,

$$P(a, b, c; x) = c + (1 - c)\,(1 + \exp\{-a(x - b)\})^{-1}.$$

We restrict attention to a particular item, say the first, and let a, b, c
denote the "true" parameter values. Let $\hat a$, $\hat b$, $\hat c$ denote
their maximum likelihood estimates (MLEs), and let $\hat P$ be the MLE
$P(\hat a, \hat b, \hat c; x)$. Using Taylor's formula, we obtain the linear
approximation

$$\hat P \approx P(a, b, c; x) + (\hat a - a)\,P_1(a, b, c; x)
+ (\hat b - b)\,P_2(a, b, c; x) + (\hat c - c)\,P_3(a, b, c; x), \tag{3}$$

where $P_1 = \frac{\partial}{\partial a} P(a, b, c; x)$ and $P_2$, $P_3$ are
the other partial derivatives. In the sequel we use this approximation to
estimate σ(x).
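
The partial derivatives that enter (3) have simple closed forms for the three-parameter curve. The sketch below writes them out and checks them against central finite differences; the closed forms follow from differentiating the curve defined above, while the parameter values and function names are purely illustrative.

```python
import math

def icc(x, a, b, c):
    """Three-parameter logistic curve P(a, b, c; x)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (x - b)))

def icc_partials(x, a, b, c):
    """Analytic partial derivatives (dP/da, dP/db, dP/dc)."""
    logistic = 1.0 / (1.0 + math.exp(-a * (x - b)))
    slope = logistic * (1.0 - logistic)
    return (
        (1.0 - c) * (x - b) * slope,  # P_1: derivative with respect to a
        -(1.0 - c) * a * slope,       # P_2: derivative with respect to b
        1.0 - logistic,               # P_3: derivative with respect to c
    )

def numeric_partials(x, a, b, c, h=1e-6):
    """Central finite-difference approximations to the same derivatives."""
    return (
        (icc(x, a + h, b, c) - icc(x, a - h, b, c)) / (2 * h),
        (icc(x, a, b + h, c) - icc(x, a, b - h, c)) / (2 * h),
        (icc(x, a, b, c + h) - icc(x, a, b, c - h)) / (2 * h),
    )

# ASSUMPTION: illustrative parameter values, not estimates from the study.
analytic = icc_partials(0.5, a=1.2, b=-0.3, c=0.2)
numeric = numeric_partials(0.5, a=1.2, b=-0.3, c=0.2)
max_error = max(abs(p - q) for p, q in zip(analytic, numeric))
```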

A.2 Rationale for an Approximation of the Standard Error of $\hat P$

At this time the theory for item parameter estimation is not sufficiently
well developed to precisely specify the conditions under which the maximum
likelihood estimates are consistent and asymptotically normal. In this applied
paper we shall assume that these yet-to-be-specified conditions have been met
and that the parameter estimates obtained from LOGIST are approximately normal
with covariance matrix given by the inverse of an information matrix (Kendall
& Stuart, 1967). In this case $\hat a - a$, $\hat b - b$, $\hat c - c$ will
be approximately multivariate normal with zero expectation and with a
covariance matrix obtainable by inverting the information matrix. All of the
other terms on the right-hand side of (3) are constants. Thus $\hat P$ is
approximately equal to a linear function of multivariate normal random
variables and is therefore normal.

To approximate the constants on the right of (3), we first note that
P(a, b, c; x) makes no contribution to the variance of $\hat P$ and can be
ignored. To estimate the partial derivatives, we replace the parameter values
by their estimates and approximate

$P_1(a, b, c; x)$ by $P_1(\hat a, \hat b, \hat c; x)$,
$P_2(a, b, c; x)$ by $P_2(\hat a, \hat b, \hat c; x)$,
$P_3(a, b, c; x)$ by $P_3(\hat a, \hat b, \hat c; x)$.

To approximate the covariance matrix for $\hat a - a$, $\hat b - b$,
$\hat c - c$, we consider a 3x3 matrix $I = (I_{rs})$ which will be shown to
be an approximation of an information matrix. The typical entry, say
$I_{12}$, in this matrix is computed as follows. Let $\theta_j$ be the
ability of the jth examinee and $\hat\theta_j$ be its maximum likelihood
estimate. $I_{12}$ is given by the formula

$$I_{12} = -\sum_j \left[ \hat P \, \frac{\partial^2}{\partial a \, \partial b}
\log P(\hat a, \hat b, \hat c; \hat\theta_j)
+ \hat Q \, \frac{\partial^2}{\partial a \, \partial b}
\log [1 - P(\hat a, \hat b, \hat c; \hat\theta_j)] \right], \tag{4}$$

where $\hat P = P(\hat a, \hat b, \hat c; \hat\theta_j)$ and
$\hat Q = 1 - \hat P$.

The rationale for this formula is obtained by regarding each answer
sheet or vector of item responses as the outcome of a two-stage experiment.
In the first stage an ability $\theta_j$ is sampled. In the second stage the
vector of item responses is generated as the outcome of a sequence of
Bernoulli trials. Thus, the probability that the examinee answers item 1
correctly is $P(a, b, c; \theta_j)$. (This experiment differs from the usual
conceptualization of latent trait data only in that abilities $\theta_j$ are
regarded as random variables rather than parameters.)

Relative to this experiment, the information matrix for the item
parameters will consist of zeros except for 3x3 matrices along the diagonal.
There will be one such 3x3 matrix for each item. Since the inverse of such
a matrix will be another "block diagonal" matrix consisting of the inverses
of the original 3x3 matrices along the diagonal, we can restrict attention
to a single item and return to the problem of approximating the typical term.
Relative to the two-stage experiment, the typical information matrix term in
the first block (i.e., for item one) is

$$-E \sum_{j=1}^{N} \frac{\partial^2}{\partial a \, \partial b}
\log \left[ P(a, b, c; \theta_j)^{u_{1j}} \, Q(a, b, c; \theta_j)^{1 - u_{1j}} \right], \tag{5}$$

where $u_{1j}$ is the item score random variable for the first item and jth
examinee, where N is the number of examinees, and Q = 1 - P. The symbol E
denotes expectation, in this case with respect to both item scores and
abilities.

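The expression (5) can be rewritten as a sum of two terms:

$$-E \sum_j u_{1j} \frac{\partial^2}{\partial a \, \partial b} \log P(a, b, c; \theta_j)
\; - \; E \sum_j (1 - u_{1j}) \frac{\partial^2}{\partial a \, \partial b} \log Q(a, b, c; \theta_j).$$

Computing the expectation of the first term gives

$$-N \int P(a, b, c; \theta) \, \frac{\partial^2}{\partial a \, \partial b}
\log P(a, b, c; \theta) \, dF(\theta),$$

where F is the (unknown) ability distribution function for the N identically
distributed $\theta_j$'s. If F is approximated by the distribution which
takes a step of size 1/N at each $\theta_j$ (i.e., by the sample cumulative
distribution of the (unobserved) abilities), then we obtain

$$-\sum_j P(a, b, c; \theta_j) \, \frac{\partial^2}{\partial a \, \partial b}
\log P(a, b, c; \theta_j).$$

Finally, the approximation (4) is obtained by treating the second term in the
same way and replacing a, b, c by $\hat a$, $\hat b$, $\hat c$ and
$\theta_j$ by its maximum likelihood estimate $\hat\theta_j$.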

A.3 Computational Details

The actual computation of the covariance matrix conformed to the
procedure just outlined, with some minor exceptions. In computing terms
in the information matrix by (4), examinees who omitted the item of interest
or for whom LOGIST failed to converge were ignored.

The covariance matrix was approximated by inverting the information
matrix. The approximation of the variance of $\hat P$ was obtained from (3),
the covariance matrix, and the usual formula for the variance of a sum of
correlated variables.
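
The steps of this appendix can be sketched in a few lines, under stated assumptions. For a binary item, the expected second-derivative expression in (4) is algebraically equal to the product of first partial derivatives divided by PQ; this identity is a standard result and is used below in place of explicit second derivatives, so the code is an equivalent rewrite rather than a transcription of the report's program. The ability estimates and item parameters are hypothetical.

```python
import math

def icc_and_gradient(x, a, b, c):
    """Return P(a, b, c; x) and its partials (dP/da, dP/db, dP/dc)."""
    logistic = 1.0 / (1.0 + math.exp(-a * (x - b)))
    slope = logistic * (1.0 - logistic)
    p = c + (1.0 - c) * logistic
    grad = [(1.0 - c) * (x - b) * slope, -(1.0 - c) * a * slope, 1.0 - logistic]
    return p, grad

def information_matrix(abilities, a, b, c):
    """3x3 matrix with entries sum_j P_r P_s / (P Q), a rewrite of (4)."""
    info = [[0.0] * 3 for _ in range(3)]
    for theta in abilities:
        p, g = icc_and_gradient(theta, a, b, c)
        weight = 1.0 / (p * (1.0 - p))
        for r in range(3):
            for s in range(3):
                info[r][s] += weight * g[r] * g[s]
    return info

def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def icc_variance(x, abilities, a, b, c):
    """Variance of the estimated ICC at x from (3): g' I^{-1} g."""
    _, g = icc_and_gradient(x, a, b, c)
    z = solve(information_matrix(abilities, a, b, c), g)
    return sum(gi * zi for gi, zi in zip(g, z))

# ASSUMPTION: hypothetical ability estimates and item parameter estimates.
abilities = [-3.0 + 0.01 * k for k in range(601)]
variance = icc_variance(0.0, abilities, a=1.0, b=0.0, c=0.2)
standard_error = math.sqrt(variance)
```

Doubling the number of examinees at each ability level doubles the information matrix and so halves the variance, which is one quick check on an implementation of this kind.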



Appendix B

Procedure for Estimating Equating Constants

Let $b_{i1}$ be the LOGIST estimate of the difficulty parameter for item i
in the base group and $b_{i2}$ be the corresponding value in the comparison
group. Let $b^*_{i2}$ be the difficulty of item i in the comparison group
after equating. The $b^*_{i2}$ are obtained by a linear transformation of the
$b_{i2}$. Specifically,

$$b^*_{i2} = A + B\,b_{i2},$$

where A and B are the equating constants. The values of A and B were
computed such that the weighted mean and variance of the $b^*_{i2}$ are equal
to the weighted mean and variance of the $b_{i1}$.
The weight for item i, $w_i$, was obtained as follows. Let $V_{i1}$ and
$V_{i2}$ be the estimated sampling variances of $b_{i1}$ and $b_{i2}$
respectively (see Appendix A for procedures used to estimate the variances).
The weight for item i is:

$$w_i = \begin{cases} 1 / V_{i1} & \text{if } V_{i1} > V_{i2}, \\
1 / V_{i2} & \text{if } V_{i1} \le V_{i2}. \end{cases}$$

The $w_i$ were used to compute the weighted mean of the $b_{i1}$
($\bar b_1$) and the $b_{i2}$ ($\bar b_2$). Similarly weighted standard
deviations of the $b_i$'s were computed in each group ($S_1$ and $S_2$). The
equating constants were then computed from

$$B = S_1 / S_2$$

and

$$A = \bar b_1 - B\,\bar b_2.$$
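
A minimal sketch of this weighted equating follows. It assumes that the weight for each item is the reciprocal of the larger of its two estimated sampling variances, and it uses the slope and intercept that make the weighted mean and standard deviation of the transformed difficulties match those of the base group. All data values and function names are hypothetical.

```python
import math

def weighted_mean_sd(values, weights):
    """Weighted mean and weighted standard deviation."""
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    variance = sum(w * (v - mean) ** 2 for w, v in zip(weights, values)) / total
    return mean, math.sqrt(variance)

def equating_constants(b_base, b_comp, var_base, var_comp):
    """Return (A, B, weights) for the transformation b* = A + B * b_comp."""
    # ASSUMPTION: weight = reciprocal of the larger sampling variance.
    weights = [1.0 / max(v1, v2) for v1, v2 in zip(var_base, var_comp)]
    mean_base, sd_base = weighted_mean_sd(b_base, weights)
    mean_comp, sd_comp = weighted_mean_sd(b_comp, weights)
    slope = sd_base / sd_comp
    intercept = mean_base - slope * mean_comp
    return intercept, slope, weights

# Hypothetical difficulty estimates and sampling variances for five items.
b_base = [-1.2, -0.4, 0.3, 0.9, 1.8]
b_comp = [-0.3, 0.6, 1.1, 1.9, 2.6]
var_base = [0.04, 0.02, 0.02, 0.05, 0.20]
var_comp = [0.03, 0.03, 0.04, 0.09, 0.45]

A, B, weights = equating_constants(b_base, b_comp, var_base, var_comp)
b_star = [A + B * b for b in b_comp]  # comparison difficulties on the base scale
```

Weighting in this way limits the influence of items whose difficulty is poorly estimated in either group, such as the last item in the hypothetical data.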
Appendix C

Procedures for Estimating Item Bias Indices

The comparison group scale was first equated to the base group scale
by a linear transformation (see Appendix B). Weights for the weighted bias
indices were based on an estimate of the standard error of the difference
between the ICCs for the comparison and base groups. Areas and weighted
areas between ICCs, as well as weighted and unweighted sum-of-squares
differences, were computed between $\theta = -3$ and $\theta = 3$.

The areas were approximated by dividing the distance between $\theta = -3$ and
$\theta = 3$ into 600 equal intervals. The distance between the ICCs at the middle
of each interval was multiplied by the length of the interval to approximate
the area between the ICCs for that interval. Areas were then summed either
before or after weighting for the appropriate indices, i.e., base high, base
low, and absolute indices. The sum-of-squares indices were computed in a
similar fashion except that the distance between the ICCs at the center of
each interval was first squared. The specific equations and computational
procedures are given below.
Let

$$\theta_0 = -3.0$$

and

$$\theta_j = \theta_{j-1} + .01 \quad \text{for } j = 1, 2, \ldots, 600 .$$

The midpoint of the $j$th interval is

$$\bar{\theta}_j = \theta_j - .005 .$$
Let $P_{j1}$ be the height of the ICC for the item in question when evaluated at
$\bar{\theta}_j$ using the estimated item parameters for the base group and let $P_{j2}$ be the
corresponding value for the comparison group. Finally, let $V_{pj1}$ and $V_{pj2}$ be
the estimated variances of the ICCs at $\bar{\theta}_j$ for the base group and comparison
group respectively (see Appendix A).

The four bias indices that were used in the results reported in the
text were:

$I_1$ = base high area,

$I_2$ = base low area,

$I_3$ = absolute difference, and

$I_4$ = square root of sum of squares.

Four weighted bias indices were also computed for each item. These were:

$W_1$ = weighted base high area,

$W_2$ = weighted base low area,

$W_3$ = weighted absolute difference, and

$W_4$ = square root of weighted sum of squares.

Detailed results are not reported for the weighted indices since they did
not prove to have clear advantages over the simpler unweighted indices for
the data analyzed for this report. The bias indices were obtained as shown
below. All summations are for $j = 1$ to 600.

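$$I_1 = (.01) \sum_j \delta_j D_j ,$$

$$I_2 = -(.01) \sum_j (1 - \delta_j) D_j ,$$

$$I_3 = I_1 + I_2 , \text{ and}$$

$$I_4 = \Big[ (.01) \sum_j D_j^2 \Big]^{1/2} ,$$

where $D_j = P_{j1} - P_{j2}$ and

$$\delta_j = \begin{cases} 1 & \text{if } P_{j1} > P_{j2} , \\ 0 & \text{if } P_{j2} \ge P_{j1} . \end{cases}$$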
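The midpoint-rule computation of the four unweighted indices can be sketched as follows. The three-parameter logistic ICC with a scaling constant of 1.7 and the parameter order (a, b, c) are assumptions made for this illustration; the function names are hypothetical, and the weighted indices are not covered.

```python
import math


def icc(theta, a, b, c, scale=1.7):
    """Three-parameter logistic ICC (the scaling constant 1.7 is an assumption)."""
    return c + (1.0 - c) / (1.0 + math.exp(-scale * a * (theta - b)))


def bias_indices(base, comp, lo=-3.0, hi=3.0, n=600):
    """Return (base high area, base low area, absolute difference, root sum of squares).

    base, comp: (a, b, c) item parameters for the base and comparison groups,
    with the comparison-group parameters already on the base-group scale.
    """
    width = (hi - lo) / n          # .01 for the default range and interval count
    high = low = squares = 0.0
    for j in range(1, n + 1):
        mid = lo + (j - 0.5) * width           # midpoint of the j-th interval
        d = icc(mid, *base) - icc(mid, *comp)  # distance between the two ICCs
        if d > 0:
            high += d * width                  # base-group curve is higher
        else:
            low -= d * width                   # base-group curve is lower
        squares += d * d * width
    return high, low, high + low, math.sqrt(squares)
```

For example, `bias_indices((1.0, 0.0, 0.2), (1.0, 0.5, 0.2))` describes an item that is uniformly easier for the base group, so the base low area is zero and the absolute difference equals the base high area.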
Appendix D

Item Parameter Estimates and Standard Errors 
for Each Subgroup Prior to Scale Equating 
Appendix E

Estimation of Equating Constants

Base Group    Comparison Group    B (Slope)    A (Intercept)

LW5           LB5                  1.01091      -0.92508
LW5           MW5                  0.88857       0.45232
LB5           MB5                  1.05854       0.47350
MW5           MB5                  1.13246      -1.00965
LW6           LB6                  0.95032      -0.97707
LW6           MW6                  0.79275       0.36811
LB6           MB6                  1.03924       0.57221
MW6           MB6                  1.22931      -1.03913
LW5           LW6                  1.00590       0.38167
LB5           LB6                  0.97725       0.34253
MW5           MW6                  0.88891       0.32543
MB5           MB6                  0.98476       0.36079

b* = A + Bb.
Bias Indices for Each Pairwise Comparison

Item    Base High    Base Low    Absolute      Root Sum
        Area         Area        Difference    of Squares

19      --           --          0.20976       0.01437
20      0.09112      0.03295     0.12406       0.00424
21      0.01190      0.06190     0.07379       0.00181
22      0.00000      0.13105     0.13105       0.00484
23      0.02852      0.02595     0.05448       0.00072
24      0.09???      0.07000     0.16157       0.00674
25      0.19246      0.20515     0.39760       0.04252
26      0.00000      0.11479     0.11479       0.00308
27      0.07259      0.07118     0.14377       0.00479
28      0.01620      0.03317     0.04936       0.00059
29      0.00000      0.06421     0.06421       0.00092
30      0.0570?      0.03365     0.09071       0.00255
31      0.?????      0.09347     0.19772       0.00929
32      0.00000      0.05173     0.05173       0.00055
33      0.04132      0.09544     0.13676       0.00414
34      0.11695      0.11275     0.22970       0.01497
35      0.00496      0.06939     0.07435       0.00104
36      0.03146      0.07552     0.10698       0.00335
37      0.00000      0.12807     0.12807       0.00371
38      0.04232      0.05???     0.10019       0.00318
39      0.00000      0.13334     0.13334       0.00383
40      0.00000      0.15431     0.15631       0.00706
41      0.00000      0.21989     0.21989       0.01769
42      0.04011      0.07130     0.11141       0.00236
43      0.00000      0.16153     0.16153       0.00791
44      --           0.02886     0.13660       0.00707
45      0.0575?      0.11106     0.16856       0.00589



LB5 - MB5

Item    Base High    Base Low    Absolute      Root Sum
        Area         Area        Difference    of Squares

 1      0.05910      0.01127     0.07037       0.00106
 2      --           0.06615     0.19273       0.00815
 3      --           0.25?37     0.27441       0.02044
 4      --           0.00000     0.12754       0.00804
 5      --           0.29277     0.36792       0.03404
 6      0.02468      0.09054     0.11522       0.00341
 7      0.??498      0.10218     0.24706       0.01400
 8      0.00375      0.26766     0.27141       0.02557
 9      0.16492      0.18164     0.34656       0.02562
10      0.06910      0.00338     0.07247       0.00139
11      0.03901      0.14960     0.18861       0.01003
12      0.18995      0.00658     0.19653       0.02407
13      --           0.05008     0.06569       0.00087
14      0.00990      0.15271     0.16262       0.01104
15      0.?????      0.11142     0.26285       0.02130
16      0.01992      0.21884     0.23876       0.01507
17      0.34053      0.12421     0.46474       0.07278
18      --           0.12?81     0.15129       0.00426
19      0.00172      0.15049     0.15220       0.00604
20      0.16977      0.14174     0.31151       0.02514
21      0.16361      0.06965     0.23326       0.01037
22      0.05755      0.17188     0.22943       0.01697
23      0.09010      0.04133     0.13142       0.00385
24      0.07607      0.10475     0.18081       0.00795
25      0.07555      0.03572     0.11126       0.00254
26      0.00330      0.09940     0.10270       0.00218
27      0.06449      0.21853     0.28301       0.03712
28      0.24457      0.09500     0.33957       0.03545
29      --           0.15383     0.21210       0.01602
30      0.00975      0.20437     0.2141?       0.00855
31      0.22704      0.01287     0.23991       0.03575
32      0.19338      0.03990     0.23328       0.01584
33      0.07230      0.07987     0.15217       0.00560
34      0.08108      0.00659     0.08767       0.00440
35      0.1427?      0.17410     0.31686       0.02490
36      0.04208      0.30337     0.34546       0.06506
37      0.02117      0.09687     0.11804       0.00584
38      0.22244      0.06712     0.28955       0.02120
39      0.07507      0.35034     0.42541       0.05497
40      --           0.?89??     0.21059       0.02958
41      0.39717      0.00000     0.39717       0.04394
42      0.23391      0.00190     0.23581       0.01653
43      0.30295      0.23412     0.53707       0.09745
44      0.09873      0.02555     0.12428       0.00356
45      --           0.00000     0.42755       0.07444



MW5 - MB5

Item    Base High Area    Base Low Area    Absolute Difference    Root Sum of Squares






M. 00099 


0. 0^^596 


0.00169 




0* 02*92 


0. 33 762 


O»05087 




Oi 04990 


0. 5?7t3 


0.06612 




0» 05782 


0^10045 


0.00244 


0. 01 902 


0* 12318 


0*14219 


0. 005 74 


v« v/oOOO 


0*00000 


0* Ofevoo 


0*00097 


0* 2209^ 


0.01476 


0. 2^ 5^ 9 


0*02579 


0-08^69 


0*00809 


0.09276 


0.00306 


0. 36S27 


0.03280 


0.40107 


0.03669 


0» 134 


0. 00000 


G. 13493 


0.00534 


0* 09226 


0^ 01 


0* I075S 


0.00375 


(?• 0$?57 


0» 03757 


0^ 13513 


0.00375 




0^ 04230 


0*04580 


0*00046 


0» 01337 


0.0144^ 


0.02783 


0.00034 




0. 07040 


0* 179} N 


0.01 141 




0» 032^5 


0* 09376 


0. 001 86 


0* i0620 


Q*000Ct> 


A A. Alt '4 A 

0* 40^20 


0.03108 


0» 10725 


A /W\AAA 

ii» 00000 


0^ !0*25 


0. 00331 


0» J8746 


0. 0000 V 


0. 1 8746 


0.01 1 71 


0* 23600 




0-.25691 


0.01957 




0*04432 


A An*v*>o 


0. 002 36 


0» ITOOT 


V* 0091 1 




0*01v/9 








A Al *>A/ 


0-11913 


0*01472 


0+ 13385 


0. 00479 






V* 


0. I^z2v 








A A^ 1 










y. ?> 




A tA*k|ft 




0. 06065 


0.08015 


0* 14079 


0*00525 


0*00000 


0.156^0 


0*15690 


0.00632 


0. 1[?645 


0.28633 


0.46278 


0.056*1 


0*09372 


0.10220 


0.19591 


0.00910 


0-OU07 


0.04850 


0.09256 


0*00213 


0.08812 


0.02471 


0.11283 


0.00S24 


1-08299 


0.14120 


1*22418 


0.64073 


0-09269 


0.00296 


0*09565 


0.00266 


0*03431 


0.02023 


0*05454 


0*00082 




0. OS 198 


0^46964 


o*o«m 


0.04086 


0.01010 


0*05096 


0.0006^ 


0.23860 


0*02917 


0*26777 


0*03166 


0. 33312 


0.00000 




0.03863 


0*29580 


0.281H 


0.57714 


0*06461 




0*00000 


0*28796 


0*02465 


0.05940 


0.07661 


0*13600 


0*00510 


0. 17755 


0.00000 


©♦17755 


0*4)0793 














Root S^fti 
















0.00000 


0.09098 


0.00213 






0,00230 


0. 257^9 


0.0257? 




0. 35^2 


0. 155n 


0.53792 


0.05544 




0. IM91 


0.01497 


0.16677 


0.01361 




0. U0?4 


0.01635 


0.15709 


0.00565 






o.on«3 


0.24238 


0. 02367 


T 




O^OlO^l 


0. IhQ^X 


0.0U23 


a 




0.04202 


0. 30540. 


0.03373 






0.07543 


0. 502 ?3 


0.05559 


If) 


0.03059 


0.01368 


0.04427 


0. 00053 


u 




0.00339 


.0. 11207 


0.00420 


\: 


0. 


0.21451 


009592 


0.03540 




0 . 00000 


0.17502 


0* 17502 


0.00670 






0.02152 


0, 08764 


0. 00304 


i 


0 . 00 ! ^0 


0.20270 


0. 20419 


0*01911 


?*» 


0.06 4 no 


0.00000 


0. 06400 


0.00128 


1? 




0.12155 


0. 39753 


0.03826 






0.02424 


0.09759 


0.0025! 


I- 


>M39 


0.00053 


0.21091 


0.01870 






0. 02624 


0. 13505 


00 7 79 




V)OZZ 


o.oo?n 


0.10733 


0.00385 






0.21909 


0.46741 


0*04934 




o.o<^n^ 


0.07291 


0. 12423 


0.00377 






0. 108^4 


0. 14573 


0* 00587 


>^ 




0. U933 


0.64263 


0*12330 


^^ 


0.0?;35 


0.JH97 


0.23431 


0.01415 


^ ? 




0.00000 


0*25342 


0.C1828 








0. 20222 


0*01526 






0,06U2 




0.00550 






0.00700 


0.23054 


0^01494 


n 


0. 075^9 


0. 4003^ 


0.47595 


0*07124 




0.03S74 


0.07953 


0. ne27 


0.00434 


1^ 


0*00000 


0.UH8 


oaina 


0.00327 


u 


O.IHOI^ 


0.00066 


0.18100 


0.0172b 


)^ 


o.on% 


0.01572 


0.02868 


0.00023 


lb 


0.19462 


0.2127* 


0^ 40760 


0*04362 


)? 




O.O3«>06 


0.13500 


0.00461 


31* 


0^33 J 20 


0.02233 


0.05353 


0.00091 


3^ 


0*095Sd 


0.0540f 


0.14964 


0*00«i4 








2151*1 


0^01352 


41 




0.08499 


0.0»844 


0.00441 




0*15873 


0% 02014 


0, 178^17 


0.00776 




0.25132 


0.07922 


0.33053 \ 


0.04229 




0. 20052 


0. U5^2 


0. 33*04 


0.02879 


45 


0- 1d*44 




0*U70« 





G.i 



LW6 - MW6

Item    Base High    Base Low    Absolute      Root Sum
        Area         Area        Difference    of Squares

 1      0.09425      0.00000     0.09425       0.00260
 2      0.07309      0.00687     0.07995       0.00265
 3      0.00000      0.27560     0.27560       0.01413
 4      0.12294      0.01781     0.14075       0.00?25
 5      0.06350      0.08982     0.15332       0.00473
 6      0.10110      0.00061     0.10170       0.00313
 7      0.00488      0.17471     0.17958       0.01430
 8      0.00373      0.17914     0.18286       0.01139
 9      0.00000      0.07310     0.07310       0.00107
10      0.01???      0.05505     0.07190       0.00153
11      0.00524      0.12446     0.12970       0.00562
12      0.08038      0.15060     0.23098       0.01171
13      0.09604      0.23585     0.33189       0.02259
14      0.02427      0.13104     0.15530       0.00983
15      0.00931      0.06794     0.07723       0.00239
16      0.03337      0.07078     0.10414       0.00313
17      0.03516      0.26208     0.29723       0.02037
18      0.02276      0.08172     0.10447       0.0024?
19      0.03147      0.04113     0.07259       0.00198
20      0.00069      0.05595     0.05664       0.00097
21      0.03506      0.01387     0.04893       0.00086
22      0.00000      0.06669     0.06669       0.00086
23      0.00000      0.09227     0.09227       0.00212
24      0.02801      0.05072     0.07873       0.00147
25      0.08245      0.17632     0.25877       0.0?737
26      0.01752      0.09615     0.11367       0.00302
27      0.07615      0.05811     0.13426       0.00475
28      0.00000      0.11266     0.132?6       0.00571
29      0.00000      0.05088     0.05088       0.00049
30      0.19576      0.00711     0.20286       0.00986
31      0.10974      0.08250     0.19224       0.00802
32      0.03460      0.04588     0.08047       0.00196
33      0.00000      0.12379     0.12379       0.00353
34      0.06301      0.06464     0.12764       0.00489
35      0.18879      0.04015     0.22894       0.02106
36      0.07398      0.04880     0.12277       0.00304
37      0.02394      0.07395     0.09789       0.00225
38      0.00735      0.15266     0.16000       0.00934
39      0.03699      0.03406     0.07304       0.00129
40      0.00130      0.19721     0.?01??       0.01013
41      0.00000      0.08442     0.08442       0.00203
42      0.01800      0.05713     0.0751?       0.00139
43      0.00000      0.14337     0.14337       0.00735
44      0.1?963      0.00000     0.16943       0.00773
45      0.?????      0.?????     0.056??       --



Item    Base High Area    Base Low Area    Absolute Difference    Root Sum of Squares







0-31292 


0.02448 






0. H^Ofl 


0.0082^ 






0.?5i62 


0.01862 






0» 57353 


0. 061.35 




0. U^9^ 


0.31735 


0,01952 






0.2*1*7 


0.01981 




0.0032? 


0* 05542 


0*00109 




0.03850 


0. 06956 


0*00129 


0. :^^*^s 




0.43373 


0*03832 




0. 12075 


0.12306 


0.00450 




0-06313 


0. 07639 


0.00170 




a.04?3R 


0. 1 79^5 


0.01236 




0.0?l«8 


0.1 282 J 


0. 00438 






0. 22763 


0.02151 






0.25575 


0.01651 




0.01367 


0*03567 


G.00026 




0.08$89 


0. 18599 


0.00854 






0.45379 


0. 04386 




0.3^020 


0. 35020 


0.03319 




0.05720 


0.08749 


0.00238 




0.03S2? 


0. U646 


0.00537 




0. mil 


0. 14665 


0. 00991 


O.Of>S4 ) 


O.o?4«i 


0.09023 


0.00222 




0. 1^4*1 


0. 19308 


0.00927 




0.22280 


0.41315 


0. 03811 




0. 120^? 


0« 27088 


0*01739 




o.o?2oe> 


0.18462 


0.00719 




0.0209! 


0.09821 


0*00249 


0*0^^ ><> 


0-0*1000 


0*09821 


0.00301 




0.02A6f* 


0. 15999 


0.01218 




0.00?36 


0.24972 


0.02777 






0. 32567 


0*0226^ 




0.03221 


0. 15651 


0,00804 




0.00^^3* 


0.21306 


0.01805 




0.08<»V. 


0. 18460 


0.01445 


0. J7^^0 


0.0«^!2? 


0- 24086 


0,02803 


0.00000 




0, 27364 


0.01459 




0.04343 


0»tni4 


0.00389 




0*0009? 


0.19252 


0.0097$ 






0* S^A2!? 


0*iH332 




0.00000 


0.16876 


0.00723 


0. 




0*3t»^l 


0.03129 


O.J 1055 




0* 26905 


0.02452 




0. 12^1? 


0. 3^.928 


0.03381 




0»JO59* 


^» ?57«7 


^mt2f 



MW6 - MB6

Item    Base High Area    Base Low Area    Absolute Difference    Root Sum of Squares




















0, 0«599 


0. 15^42 


0.00 76? 


J! 




0. 00144 


0.24»95^ 


0.021190 


I 




0,24 792 


0. 6H40 


0* 0«003 


ii 






0^ W?9 


0.02909 




0. 1fc:^^ 


0,00000 


0* i62!>> 


0*00697 






0,04^s? 


0.04^5^ 


0,00063 


* 




O,00?0tt 


0,2!«642 


0.0362 7 






ft. OHO! 


0.49S76 


0.091^1 






0,00000 


0,2Bi6> 


0.01736 




0.01033 


0, 13?*10 


0. i4<^42 


o.oos*^^ 


u 




0. 0-0000 


0. 1»2^7 


O*00«24 




0. 


0,00000 




0-004 32 


: 1 






0.1$331 


0* 0049'j 






0,06^9) 


0,0?72l!> 


0,00277 






0. 


0.23 W 


0.026M) 








0. 11905 


0.004 #4 


« ^ 


0.^0? M 




0.5>062 


0.0684^ 






vK0993? 


0, 37021* 


0.03674 




(>.00S*>1 


0. 10992 


0.11^43 


0.00^^86 






0. 009ft « 


0^10221 


0. 0036S 




0. K>A2b 


0.00013 


0. J2438 


O.005>6 






0. "jos^? 




0.0^220 






0> 00000 


0.09579 


0.0024 3 






0. n^OD 


0. 16100 


0.0090S 






0.24987 


\ . 36264 


0»57430 






O*o0?20 


0,046?ft 


o,ooo?i 






O.ooooo 


0, 2^S??^ 


0.01^^05 






0.06000 


0.06244 


o*oon2 








0. ^64^3 


0, 008 S9 






0*00000 




0.00507 


n 




0. 2^93 ? 


0*43216 


0.04074 






0, 1684*t 


0.234^3 


0.01^49 


33 


0«n9?9l 


0. r::*.T2 


0. J2462 


0»OO460 






. 00000 


0* 38261 


0*03769 






0, tun 


0.1 M94 


o.on^^ 


3* 


0.20641 


0.09042 


0,29683 


O.OJ677 


3? 


0.01292: 




0.1967$ 


0*OI6iS 


39 




n.ooooo 


0*llll!»43 


0.OIO09 






o^on*'^ 


0.20132 


0*01506 








0,269^0 


0.OU49 






OtOOOOO 


o*n?io 


0.002f? 




0,3n?T 


3 2*».% 




r. 05? 94 












44 


0.07800 ^ 


0, 04991^ 


0*I2?96 


0.00442 


45 


0» ?D70« 


0.0192!^ 


0*22653 


0,0U16 



LW5 - LW6

Item    Base High Area    Base Low Area    Absolute Difference    Root Sum of Squares










o,onMJ 


o» U3m 


0*004 36 








0.08470 


0.00249 






0*0401^ 


0.10620 


0.00,^34 






0.0f»^«2 


O^O6$05 


0.00168 






0* 00000 


0. 16764 


0. 00650 






0» 2^44 7 


0.29447 


0*02591 






o» !44e>^ 


0. nasi 


0*009^^ 




0, 020 J 1 


0, I03?l 


0* I23S5 


0*00463 






0* IS760 


0.24370 


0.pl30\ 




0. OOOOfy 


0.JIIJ91 


0. 191$^ 


0. 00978 




0,033711 


0,05624 


0*09001 


0. 00245 


i: 


0,093 70 


0, ISbOJ 


0.24970 


0.01661 


1 ) 


0*073n 


0,00121 


0.07436 


0.001A2 




0. 02301! 


0.00442 


Oi 02749 


0. 00040 


I % 




0*001 73 


0.07386 


0*00259 


1 6 


0,G3i79 


0* U096 


0* 14374 


0*00584 


I ? 


0» 00000 


0. lf)8B2 


0^ 16M2 


0*00625 


t >^ 




0. !^ 


0* n625 


0^00492 


1^ 


0,00000 




0. 15163 


Om 00886 


- 0 


0. lf>fyT3 


0^03*>?6 


0-20349 


0*01034 




0,00003 


, 0,0P93? 


0.08939 


0»O024l 






0, 06^79 


0. 0^555 


0.001 79 


* • 




0*03099 


0.03226 


O.OO028 




0. 


0. 03747 
Table 1

Number of Students Within Each Subgroup
Used to Estimate Parameters of
Item Characteristic Curves

Income                Blacks    Whites

Grade 5

  Low                  2024      2109
  Middle or High        463      2111

Grade 6

  Low                  1907      2028
  Middle or High        444      2137

Table 2

Base Group and Comparison Group
in Each of the Twelve Pairwise Comparisons

Base Group                                Comparison Group

Grade Level Comparisons

LW5: Low income, white, grade 5           LW6: Low income, white, grade 6
LB5: Low income, black, grade 5           LB6: Low income, black, grade 6
MW5: Middle income, white, grade 5        MW6: Middle income, white, grade 6
MB5: Middle income, black, grade 5        MB6: Middle income, black, grade 6

Income Comparisons

LW5: Low income, white, grade 5           MW5: Middle income, white, grade 5
LB5: Low income, black, grade 5           MB5: Middle income, black, grade 5
LW6: Low income, white, grade 6           MW6: Middle income, white, grade 6
LB6: Low income, black, grade 6           MB6: Middle income, black, grade 6

Racial Comparisons

LW5: Low income, white, grade 5           LB5: Low income, black, grade 5
MW5: Middle income, white, grade 5        MB5: Middle income, black, grade 5
LW6: Low income, white, grade 6           LB6: Low income, black, grade 6
MW6: Middle income, white, grade 6        MB6: Middle income, black, grade 6
Table 3

Expected and Observed Distributions of the Number of Times an Item is
Identified as Biased Based on a Root of the Sum of Squares Bias
Index Greater than or Equal to .2

Number of Times    Expected     Observed
Identified         Frequency    Frequency

4                     .05           3
3                     .92           1
2                    6.29           6
1                   18.48           7
0                   19.27          28

Expected frequency based on assumption that 13,
7, 7, and 7 items are randomly identified as biased
in the four independent replications (i.e., LW5-LB5,
LW6-LB6, MW5-MB5, and MW6-MB6).
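The expected frequencies in Table 3 can be checked with a short calculation. The sketch below assumes that the four comparisons flag items independently of one another and that, within a comparison, each of the 45 items is equally likely to be among those flagged; the function name `expected_counts` is hypothetical.

```python
def expected_counts(n_flagged, n_items):
    """Expected number of items flagged exactly 0, 1, ..., k times.

    n_flagged: number of items flagged in each of the k comparisons.
    n_items:   total number of items on the test.
    """
    probs = [count / n_items for count in n_flagged]

    # dist[t] is the probability that a given item has been flagged
    # exactly t times in the comparisons processed so far.
    dist = [1.0]
    for p in probs:
        updated = [0.0] * (len(dist) + 1)
        for t, mass in enumerate(dist):
            updated[t] += mass * (1.0 - p)   # not flagged in this comparison
            updated[t + 1] += mass * p       # flagged in this comparison
        dist = updated

    return [n_items * mass for mass in dist]


counts = expected_counts([13, 7, 7, 7], 45)
```

Under these assumptions, `counts` lists the expected frequencies for 0 through 4 identifications, and the values agree with the expected column of Table 3 to two decimal places.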
Table 4

Agreement in the Identification of Items
as Biased Based on the Square Root of the Sum of Squares
Greater than .2 for the Pairs of Racial Group Comparisons



Lb 



CoTOparl s on 
U B 

B b 7 



— 4 



U * 32 ! 
i L 

If 



0 



B 



"9 I 

—4 1 

29 i 3 j 
I,. B 



8 ! 5 

i 

30 j 2 

J, . j 

M6 

U B 



5 



U j 3A I 4 







u 

v.,^ 


B 






3 






L6 


[ : 








35 


3 1 



m 

u B 

I 3 [ 4 1 



M5 



} 



U 1 35 ! 3 ! 

m 



Phi 



67 



27 



.32 



.49 



Chi- Square 



20.38 



3.23 



7/31 



4.69 



10.89 



Note: U = unbiased, B = biased
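For a 2 x 2 agreement table of this kind, phi and the uncorrected Pearson chi-square are linked by chi-square = N times phi squared. The sketch below illustrates that relationship; the function name is hypothetical, the cell counts in the usage note are made up for illustration and are not taken from Table 4, and the absence of a continuity correction is an assumption.

```python
import math


def phi_and_chi_square(uu, ub, bu, bb):
    """Phi coefficient and uncorrected chi-square for a 2 x 2 agreement table.

    uu: items unbiased in both comparisons    ub: unbiased in first, biased in second
    bu: biased in first, unbiased in second   bb: biased in both comparisons
    """
    n = uu + ub + bu + bb
    row_u, row_b = uu + ub, bu + bb      # marginals for the first comparison
    col_u, col_b = uu + bu, ub + bb      # marginals for the second comparison
    phi = (uu * bb - ub * bu) / math.sqrt(row_u * row_b * col_u * col_b)
    return phi, n * phi ** 2
```

With hypothetical counts such as `phi_and_chi_square(36, 2, 2, 5)` for 45 items, phi is positive because most items receive the same classification in both comparisons.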



Table 5

Correlations between the Square Root of the
Sum-of-Squares Bias Indices for Pairs
of Independent Racial Group Comparisons

Comparison     L5      L6      M5      M6

L5
L6             .39
M5             .47     .14
M6             .21     .64     .36

The comparisons are between racial
groups within income and grade level. L5 =
low income, grade 5; L6 = low income, grade
6; M5 = middle income, grade 5; and M6 =
middle income, grade 6.
Table 6

Rank Order of Bias Indices for the Three Items Identified
as Possibly Biased in All Four Comparisons

                                      Index

Item    Comparison    Base-High    Base-Low    Absolute      Root Sum
                      Area         Area        Difference    of Squares

 3      L5                3           15           2             6
        L6                3           10           2             4
        M5                3           15           4             4
        M6                4            4           2             3

25      L5                9           20           9            10
        L6                1           11           1             1
        M5                2            3           2             2
        M6                1            3           1             1

31      L5               33            2          10             8
        L6               30            1           4             2
        M5               18            1           6             6
        M6               20            2           7             7
Table 7

Rank Order of Base-High and Base-Low Area Bias Indices
for the Four Racial Group Comparisons
Involving Word Meaning and Best Title Items

Item                       Base-High Area               Base-Low Area
Number   Word           L5    L6    M5    M6        L5     L6     M5     M6

Word Meaning

  2      there          12     5     8    12       41.5    40    24.0    33
  6      rings          43     9    34    45       22.0    35    41.5    23
 13      rest           21    42    21    36       11.0     5    12.0     5
 17      setting        20    10     5     3        4.0     8    41.5     9
 19      run            23    12    16    42       41.5    42    41.5    34
 23      tribute        31    35    2?    29       21.0    20     7.0    40
 25      character       9     1     2     1       20.0    11     3.0     3
 27      reigning        2     6    14    10       41.5    44    13.0    40
 29      assumed        24    37    33    24       31.0    21    10.0    26
 39      true            7    27    39    18       17.0    22    33.0    28
 42      speculate       1    17     9     6       10.0    30     2.0    12

Best Title

  5                      8    20    41    21       34.0    31     8.0    40
 11                     18    23    27    19       36.0    39    28.0    40
 18                     16    31    22    --       41.5    27    41.5    15
 24                     13    37    20    43       29.0    14    30.0    10
 31                     33    30    18    20        2.0     1     1.0     2
Figure Captions

Figure 1. Distributions of the square root of the sum of squares bias
indices for the twelve pairwise comparisons.

Figure 2. Item characteristic curves and confidence intervals for
fifth- and sixth-grade white students attending schools in low-income
neighborhoods (Items 6 and 18).

Figure 3. Item characteristic curves and confidence intervals for
fifth- and sixth-grade black students attending schools in low-income
neighborhoods (Item 35).

Figure 4. Item characteristic curves and confidence intervals for four
independent racial group comparisons (Item 3).

Figure 5. Item characteristic curves and confidence intervals for four
independent racial group comparisons (Item 25).

Figure 6. Item characteristic curves and confidence intervals for four
independent racial group comparisons (Item 31).

Figure 7. Item characteristic curves and confidence intervals for
fifth- and sixth-grade white students attending schools in low-income
neighborhoods (Items 3, 25, and 31).

Figure 8. Test characteristic curves for four independent racial group
comparisons.

Figure 9. Expected raw score distributions for LW5 and LB5 students
with $\theta = 0$.